Predicting F0 and voicing from NAM-captured whispered speech
Authors
Abstract
The NAM-to-speech conversion proposed by Toda and colleagues, which converts Non-Audible Murmur (NAM) to audible speech through a statistical mapping trained on aligned corpora, is a very promising technique. Its performance is still insufficient, however, mainly because of the difficulty of estimating the F0 of the converted voice from unvoiced speech. In this paper, we propose a method to improve F0 estimation and the voicing decision in a NAM-to-speech conversion system based on Gaussian Mixture Models (GMMs) applied to whispered speech. Instead of combining the voicing decision and F0 estimation in a single GMM, a simple feed-forward neural network detects voiced segments in the whisper, while a GMM trained on voiced segments estimates a continuous melodic contour. The error rate of the network's voiced/unvoiced decision is 6.8%, compared with 9.2% for the original system. The proposal also reduces the F0 estimation error.
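The split described in the abstract (one model for the voicing decision, another for a continuous F0 contour) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a scalar input feature per frame, a joint GMM whose parameters are already known, and the simple minimum mean-square-error conditional mean as the mapping rule, which may differ from the mapping used in the original system. The names `gmm_conditional_mean`, `gate_f0`, and `vuv_error_rate` are hypothetical, and the voicing probabilities stand in for the output of any classifier, such as a feed-forward network.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_conditional_mean(x, components):
    """MMSE estimate E[y | x] under a joint GMM over (x, y).

    Each component is a tuple (weight, mean_x, mean_y, var_x, cov_xy).
    Assumption: scalar x and y; a real system would use feature vectors.
    """
    # Unnormalised posterior of each component given the input x.
    weighted = [w * gaussian_pdf(x, mx, vx) for (w, mx, _my, vx, _cxy) in components]
    total = sum(weighted)
    estimate = 0.0
    for post, (_w, mx, my, vx, cxy) in zip(weighted, components):
        # Per-component linear regression of y on x, weighted by the posterior.
        estimate += (post / total) * (my + cxy / vx * (x - mx))
    return estimate


def gate_f0(continuous_f0, voicing_prob, threshold=0.5):
    """Keep the continuous F0 only where the classifier says 'voiced'.

    Unvoiced frames are marked with 0.0, a common convention.
    """
    return [f0 if p >= threshold else 0.0
            for f0, p in zip(continuous_f0, voicing_prob)]


def vuv_error_rate(predicted_f0, reference_f0):
    """Fraction of frames whose voiced/unvoiced label disagrees with the reference."""
    errors = sum((p > 0) != (r > 0) for p, r in zip(predicted_f0, reference_f0))
    return errors / len(reference_f0)


if __name__ == "__main__":
    # Hypothetical two-component joint GMM; the values are illustrative only.
    gmm = [(0.5, -1.0, 110.0, 1.0, 5.0), (0.5, 1.0, 140.0, 1.0, 8.0)]
    features = [-1.2, -0.3, 0.4, 1.1]          # one scalar feature per frame
    voicing_prob = [0.9, 0.3, 0.8, 0.95]       # stand-in for the network output
    contour = [gmm_conditional_mean(x, gmm) for x in features]
    print(gate_f0(contour, voicing_prob))
```

Keeping the two functions separate mirrors the design choice in the abstract: the contour is estimated for every frame, and the voicing decision is applied afterwards, so either model can be replaced without retraining the other.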
Similar resources
Improvement to a NAM-captured Whisper-to-speech System
Exploiting a tissue-conductive sensor – a stethoscopic microphone – the system developed at NAIST which converts Non-Audible Murmur (NAM) to audible speech by GMM-based statistical mapping is a very promising technique. The quality of the converted speech is however still insufficient for computer-mediated communication, notably because of the poor estimation of F0 from unvoiced speech and beca...
Perception of Tone in Whispered Mandarin Sentences: The Case for Singapore Mandarin
Whispering is commonly used when one needs to speak softly (for instance, in a library). Whispered speech mainly differs from neutral speech in that voicing, and thus its acoustic correlate F0, is absent. It is well known that in tonal languages such as Mandarin, tone identity is primarily conveyed by the F0 contour. Previous works also suggest that secondary correlates are both consistent and ...
Voicing assimilation in whispered speech
A large body of literature has shown that phonemic voicing contrasts are preserved in the production and perception of whispered speech. Nevertheless, it is unclear to what extent allophonic voicing is also maintained in whisper. The present study investigates whether a non-contrastive voicing distinction in Spanish fricatives – which results from voice assimilation in obstruent clusters – is a...
Aerodynamic and durational cues of phonological voicing in whisper
This paper presents analyses on the phonological voicing contrast in whispered speech, which is characterized by the absence of vocal fold vibrations. In modal speech, besides glottal vibration, the contrast between voiced and unvoiced consonants is realized by other phonetic correlates: e.g. consonant and pre-consonantal vowel durations, intraoral pressure differences. The analysis of these vo...